The purpose of this notebook is to analyze participant-level data collected for study SGC-3: The Insight Hypothesis.

TODO QUESTIONS:
1. The main accuracy score is very much not a normal distribution. What is the most appropriate distribution to use to model these data?
2. Is the linear regression a good enough fit?

#IMPORT DATA from fall, spring, and online files
fall_participants <- "data/fall17_sgc3a_participants.csv"
spring_participants <- "data/spring18_sgc3a_participants.csv"
online_participants <- "data/fall21_sgc3a_participants.csv"

df_fall <- read.csv(fall_participants)
df_spring <- read.csv(spring_participants)
df_online <- read.csv(online_participants)

#indicate study modality
df_fall$mode <- "lab"
df_spring$mode <- "lab"
df_online$mode <- "online"

#Create combined data frame
df_subjects <- rbind(df_fall, df_spring, df_online) #, df_replication)
df_subjects$tri_min <- df_subjects$triangular_time / 1000 / 60
df_subjects$test_min <- df_subjects$tt_t / 1000 / 60
df_subjects$learn_min <- df_subjects$ts_t / 1000 / 60

#Create factors 
df_subjects <- df_subjects %>% mutate(
  subject = as.factor(subject),
  session = as.factor(session),
  term = as.factor(term),
  condition = as.factor(condition),
  explicit = as.factor(explicit),
  impasse = as.factor(impasse),
  axis = as.factor(axis)
)

df_fall <- df_subjects %>% filter(term =="fall17")
df_spring <- df_subjects %>% filter(term =="spring18")
df_online <- df_subjects %>% filter(term =="fall21")
df_lab <- df_subjects %>% filter(term != "fall21")

INTRODUCTION

Data were initially collected in person (SONA groups in a computer lab) in Fall 2017. In Spring 2018, additional data were collected after small modifications were made to the experimental platform: the multiple-choice input buttons were enlarged, and an additional free-response question was added after the main task block. In Fall 2021, the study was replicated using an asynchronous, online SONA pool.

#MANUALLY INSPECT TERMS
df_subjects %>% group_by(term) %>% 
  dplyr::summarize(n=n())
## # A tibble: 3 × 2
##   term         n
##   <fct>    <int>
## 1 fall17      54
## 2 fall21     140
## 3 spring18    72

SGC3-A is a 2-condition (111 = control vs. 121 = impasse) between-subjects experiment with 266 participants.

#MANUALLY INSPECT conditions
df_subjects %>% group_by(term,condition) %>% 
  summarize(n=n())
## # A tibble: 6 × 3
## # Groups:   term [3]
##   term     condition     n
##   <fct>    <fct>     <int>
## 1 fall17   111          27
## 2 fall17   121          27
## 3 fall21   111          68
## 4 fall21   121          72
## 5 spring18 111          35
## 6 spring18 121          37

DESCRIPTIVES

Participants

#Describe participants
subject.stats <- rbind(
  "lab"= df_lab %>% select(age) %>% unlist() %>% favstats(),
  "online" = df_online %>% select(age) %>% unlist() %>% favstats()
) 
subject.stats$female <- c(
  (df_lab %>% filter(sex=="Female") %>% count())$n,
  (df_online %>% filter(sex=="Female") %>% count())$n
)

For in-person collection, 126 undergraduate STEM majors (60% female; ages 18-33 years) at a public American university participated in exchange for course credit. Participants were randomly assigned to one of two experimental groups.

For the online replication, 140 undergraduate STEM majors (70% female; ages 18-31 years) at a public American university participated online, asynchronously, in exchange for course credit. Participants were randomly assigned to one of two experimental groups.

Response Accuracy

Response accuracy refers to how many questions the subject answers with a correct (triangular) interpretation.

#DESCRIBE distribution of triangular-correct scores
score.stats <- rbind(
  "lab"= favstats(df_lab$triangular_score),
  "online"= favstats(df_online$triangular_score)
)
score.stats
##        min Q1 median Q3 max     mean       sd   n missing
## lab      1  2      3 11  15 5.809524 4.893611 126       0
## online   1  2      2  8  15 5.007143 4.738479 140       0


For in-person collection, accuracy scores (n = 126) range from 1 to 15 (M = 5.81, SD = 4.89).

For the online replication, accuracy scores (n = 140) range from 1 to 15 (M = 5.01, SD = 4.74).

#VISUALIZE distribution of response accuracy
plab <- gf_histogram(~triangular_score, data = df_lab) %>%
  gf_vline(xintercept = score.stats["lab",]$mean, color = "black") +
  labs(title="Lab")

ponline <- gf_histogram(~triangular_score, data = df_online) %>%
  gf_vline(xintercept = score.stats["online",]$mean, color = "black") +
  labs(title="Online")

plot <-ggarrange(plab, ponline, common.legend = TRUE, nrow = 1, ncol =2)
annotate_figure(plot, top = text_grob("Score Accuracy by Study",
   color = "black", face = "bold", size = 14))


However, QQ plots reveal that this variable does not approximate a normal distribution.

qqPlot(df_lab$triangular_score)

## [1]  7 10
qqPlot(df_online$triangular_score)

## [1]  8 11


TODO: These distributions are clearly not normal. Is a linear model appropriate here, or should we be using a Poisson distribution? See the due diligence section.

Response Latency

TODO: Investigate the very high and very low response times. TODO: Investigate appropriate models for response time data (see: https://lindeloev.github.io/shiny-rt/).
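As a starting point for that TODO, here is a minimal hedged sketch, on simulated latencies rather than the study data, of comparing a normal and a lognormal fit by maximum likelihood and AIC in base R:

```r
#Hedged sketch on SIMULATED data (not df_subjects): normal vs. lognormal fit
set.seed(1)
rt <- rlnorm(200, meanlog = 2.2, sdlog = 0.4)  #simulated latencies, minutes

#normal MLEs: sample mean and the ML (divide-by-n) standard deviation
mu <- mean(rt); sigma <- sqrt(mean((rt - mu)^2))
ll_norm <- sum(dnorm(rt, mu, sigma, log = TRUE))

#lognormal MLEs: mean and ML standard deviation of log(rt)
ml <- mean(log(rt)); sl <- sqrt(mean((log(rt) - ml)^2))
ll_lnorm <- sum(dlnorm(rt, ml, sl, log = TRUE))

aic <- function(ll, k) 2 * k - 2 * ll  #k = number of fitted parameters
c(norm = aic(ll_norm, 2), lnorm = aic(ll_lnorm, 2))  #lower AIC is better
```

The same comparison could be run on tri_min in place of the simulated rt vector.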

#DESCRIBE distribution of response time
time.stats <- rbind(
  "lab"= favstats(df_lab$tri_min),
  "online"= favstats(df_online$tri_min)
)
time.stats <- time.stats %>% select(-missing) #don't need missing column
time.stats
##             min       Q1   median       Q3      max     mean       sd   n
## lab    3.770800 7.283617 8.804333 10.71089 19.98955 9.253020 2.902315 126
## online 1.983417 7.083704 8.811267 11.84312 27.87285 9.924459 4.527250 140


For in-person collection, total response latencies for the stimulus blocks (n = 126) range from 3.77 to 19.99 minutes (M = 9.25, SD = 2.90).

For the online replication, response latencies (n = 140) range from 1.98 to 27.87 minutes (M = 9.92, SD = 4.53).

#VISUALIZE distribution of response time
plab <- gf_dhistogram(~tri_min, data = df_lab) %>%
  gf_vline(xintercept = time.stats["lab",]$mean, color = "black") %>% 
  gf_fitdistr(color="red")+
  labs(title="Lab")

ponline <- gf_dhistogram(~tri_min, data = df_online) %>%
  gf_vline(xintercept = time.stats["online",]$mean, color = "black") %>% 
  gf_fitdistr(color="red")+
  labs(title="Online")

plot <-ggarrange(plab, ponline, common.legend = TRUE, nrow = 1, ncol =2)

annotate_figure(plot, top = text_grob("Total Time by Study",
   color = "black", face = "bold", size = 14))

qqPlot(df_lab$tri_min)

## [1] 67 41
qqPlot(df_online$tri_min)

## [1] 107 120



However, the distributions are clearly not normal, so it is appropriate to transform the response latency variable.

TRANSFORM Response Latency

#Apply a log transform
df_lab$log_time <- log(df_lab$tri_min)
df_online$log_time <- log(df_online$tri_min)

#Calculate log transform stats
log_time.stats <- rbind(
  "lab" = favstats(~log_time, data = df_lab),
  "online" = favstats(~log_time, data = df_online))
log_time.stats

#Visualize log transformed data 
plab <- gf_dhistogram(~log_time, data = df_lab) %>%
gf_vline(xintercept = ~log_time.stats['lab',]$mean, color = "black") %>%
gf_fitdistr(color="blue") %>% 
gf_labs(title ="Lab", x="Log-transformed latency (minutes)")

ponline <- gf_dhistogram(~log_time, data = df_online) %>%
gf_vline(xintercept = ~log_time.stats['online',]$mean, color = "black") %>%
gf_fitdistr(color="blue") %>% 
gf_labs(title ="Online", x="Log-transformed latency (minutes)")

plot <-ggarrange(plab, ponline, common.legend = TRUE, nrow = 1, ncol =2)

annotate_figure(plot, top = text_grob("Log-Transformed Time by Study",
   color = "black", face = "bold", size = 14))

#VERIFY normality of resulting data with qqPlot
qqPlot(~log_time, data = df_lab)

qqPlot(~log_time, data = df_online)


TODO: The fit appears much better than before. Is it good enough?
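One way to quantify "good enough" is a formal normality test such as Shapiro-Wilk. A hedged sketch on simulated (not study) data:

```r
#Hedged sketch on SIMULATED data: Shapiro-Wilk before and after log transform
set.seed(2)
x <- rlnorm(100)              #simulated right-skewed latencies
shapiro.test(x)$p.value       #raw data: normality strongly rejected
shapiro.test(log(x))$p.value  #log scale: normal by construction here
```

Applied to the study data, this would be shapiro.test(df_lab$log_time) and shapiro.test(df_online$log_time), bearing in mind that with n > 100 even small departures from normality can reach significance.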

HYPOTHESIS TESTING

Response Accuracy by Condition

The experimental hypothesis (H1) is that structuring the data to pose an impasse (condition 121) will produce significantly better performance than non-impasse (condition 111). The null hypothesis (H0) is that there will be no difference in performance between conditions.

[EXPLORE]

#DESCRIBE scores by condition
score.cond.stats <- rbind(
  "lab" = favstats(triangular_score ~ condition, data = df_lab),
  "online" = favstats(triangular_score ~ condition, data = df_online)
)
score.cond.stats
##          condition min Q1 median    Q3 max     mean       sd  n missing
## lab.1          111   1  2    2.0  3.75  15 4.500000 4.643875 62       0
## lab.2          121   1  2    6.5 12.00  15 7.078125 4.828174 64       0
## online.1       111   1  2    2.0  3.00  15 3.897059 3.996789 68       0
## online.2       121   1  2    3.0 11.25  15 6.055556 5.156396 72       0

For the in-person study, participants in the impasse group had, on average, higher scores (M = 7.08, SD = 4.83) than those in the non-impasse control group (M = 4.50, SD = 4.64).

For the online replication study, participants in the impasse group had, on average, higher scores (M = 6.06, SD = 5.16) than those in the non-impasse control group (M = 3.90, SD = 4.00).

#VISUALIZE scores by condition
condlabels <- c("111"="control", "121"="impasse")
plab <- gf_dhistogram( ~triangular_score, fill= ~condition, data = df_lab) %>% 
  gf_facet_grid(condition~., labeller=labeller(condition=condlabels)) %>% 
  gf_vline(xintercept = ~mean, data = score.cond.stats[c(1:2),], color = "blue")+
  labs(title="In Person")

ponline <- gf_dhistogram( ~triangular_score, fill= ~condition, data = df_online) %>% 
  gf_facet_grid(condition~., labeller=labeller(condition=condlabels)) %>% 
  gf_vline(xintercept = ~mean, data = score.cond.stats[c(3:4),], color = "blue")+
  labs(title="Online")

plot <-ggarrange(plab, ponline, legend = FALSE, nrow = 1, ncol =2)
annotate_figure(plot, top = text_grob("Score Accuracy by Condition",
   color = "black", face = "bold", size = 14))

#VISUALIZE scores by condition
plab <- gf_boxplot(triangular_score ~ condition, data=df_lab) %>% 
  gf_jitter(color=~condition, alpha=0.5) +
  labs (title = "In Person")

ponline <- gf_boxplot(triangular_score ~ condition, data = df_online) %>% 
  gf_jitter(color=~condition, alpha=0.5)+
  labs(title ="Online")

plot <-ggarrange(plab, ponline, legend = FALSE, nrow = 1, ncol =2)
annotate_figure(plot, top = text_grob("Score Accuracy by Condition",
   color = "black", face = "bold", size = 14))

[LINEAR MODEL]

#MODEL triangular score by condition with a linear model
m1 = lm(triangular_score ~ condition, data = df_lab)
summary(m1)
## 
## Call:
## lm(formula = triangular_score ~ condition, data = df_lab)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.078 -3.078 -2.500  3.922 10.500 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    4.5000     0.6018   7.478  1.2e-11 ***
## condition121   2.5781     0.8444   3.053  0.00277 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.738 on 124 degrees of freedom
## Multiple R-squared:  0.06993,    Adjusted R-squared:  0.06243 
## F-statistic: 9.323 on 1 and 124 DF,  p-value: 0.00277
#the first coefficient is the model estimate (mean) for the first group; the second is the difference for the second group

# partition variance with anova
# anova(m1)
# supernova(m1) #runs but doesn't knit?

#calculate effect size
eta_squared(m1, partial = FALSE)
## # Effect Size for ANOVA (Type I)
## 
## Parameter | Eta2 |       95% CI
## -------------------------------
## condition | 0.07 | [0.01, 1.00]
## 
## - One-sided CIs: upper bound fixed at (1).
#plot residuals
#produce residual vs. fitted plot
plot(fitted(m1), resid(m1))
abline(0,0) #add a horizontal line at 0

plot(density(resid(m1)))

For the in-person data, a linear model predicting triangular score from condition explains approximately 7% of the variance in triangular score, F(1, 124) = 9.32, p = .003. TODO: Do these residuals look ok?
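For the residuals TODO, base R's built-in lm diagnostics may be easier to read than the hand-rolled plots above. A hedged sketch using a toy two-group model on the built-in mtcars data as a stand-in for m1:

```r
#Hedged sketch: standard lm diagnostic plots on a TOY model (stand-in for m1)
fit <- lm(mpg ~ factor(am), data = mtcars)  #two-group model, like score ~ condition
par(mfrow = c(2, 2))
plot(fit)  #residuals vs fitted, normal QQ, scale-location, residuals vs leverage
```

For the study models, plot(m1) and plot(m2) would produce the same four panels.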

#MODEL triangular score by condition with a linear model
m2 = lm(triangular_score ~ condition, data = df_online)
summary(m2)
## 
## Call:
## lm(formula = triangular_score ~ condition, data = df_online)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -5.056 -3.056 -1.897  2.103 11.103 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    3.8971     0.5614   6.941 1.39e-10 ***
## condition121   2.1585     0.7829   2.757  0.00662 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.63 on 138 degrees of freedom
## Multiple R-squared:  0.05221,    Adjusted R-squared:  0.04534 
## F-statistic: 7.601 on 1 and 138 DF,  p-value: 0.006622
#the first coefficient is the model estimate (mean) for the first group; the second is the difference for the second group

# partition variance with anova
# anova(m1)
# supernova(m2) #runs but doesn't knit?

#calculate effect size
eta_squared(m2, partial = FALSE)
## # Effect Size for ANOVA (Type I)
## 
## Parameter | Eta2 |       95% CI
## -------------------------------
## condition | 0.05 | [0.01, 1.00]
## 
## - One-sided CIs: upper bound fixed at (1).
#plot residuals
#produce residual vs. fitted plot
plot(fitted(m2), resid(m2))
abline(0,0) #add a horizontal line at 0

plot(density(resid(m2)))

For the online replication, a linear model predicting triangular score from condition explains approximately 5% of the variance in triangular score, F(1, 138) = 7.60, p = .007. TODO: Do these residuals look ok?

POST-HOC EXPLORATION

1. Response Accuracy

1.1 Learning VS Testing

In SGC3A, participants are provided with 15 questions. For the first 5 questions, they experience a ‘scaffolded’ version of the stimuli (control scaffold = condition 111, impasse scaffold = condition 121).

What if we only consider the final 10 test items? That is, assume students have 5 questions' worth of scaffolding to figure out the graph for themselves, and assess accuracy only on the final ten items. Is the experimental hypothesis still supported?

#DESCRIBE scores by condition
test.stats <- favstats(tt_n ~ condition, data = df_subjects)

#VISUALIZE scores by condition
gf_dhistogram( ~tt_n, fill= ~condition, data = df_subjects) %>% 
  gf_facet_grid(condition~.) %>% 
  gf_vline(xintercept = ~mean, data = test.stats, color = "blue") 

#VISUALIZE scores by condition
gf_boxplot(tt_n ~ condition, data=df_subjects) %>% 
  gf_jitter(color=~condition, alpha=0.5)

mTest = lm(tt_n ~ condition, data = df_subjects)
summary(mTest)
## 
## Call:
## lm(formula = tt_n ~ condition, data = df_subjects)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.654 -2.654 -1.277  3.346  6.723 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    3.2769     0.2673  12.258  < 2e-16 ***
## condition121   1.3775     0.3739   3.684 0.000278 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.048 on 264 degrees of freedom
## Multiple R-squared:  0.0489, Adjusted R-squared:  0.0453 
## F-statistic: 13.57 on 1 and 264 DF,  p-value: 0.0002782
anova(mTest)
## Analysis of Variance Table
## 
## Response: tt_n
##            Df  Sum Sq Mean Sq F value    Pr(>F)    
## condition   1  126.12 126.118  13.574 0.0002782 ***
## Residuals 264 2452.79   9.291                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# supernova(mTest)

A linear regression model predicting score (test blocks only) from condition explains approximately 5% of the variance, F(1, 264) = 13.57, p < .001.

2. Response Latency

#VISUALIZE distribution of response time
condlabels <- c("111"="control", "121"="impasse")
#fall and spring terms were both collected in lab, so both use the lab mean
gfall <- gf_dhistogram(~tri_min, data = df_fall) %>%
  gf_vline(xintercept = time.stats['lab',]$mean, color = "red")+
  facet_grid(~condition, labeller = labeller(condition=condlabels))+
  labs(title="Fall", x="")

gspring <- gf_dhistogram(~tri_min, data = df_spring) %>%
  gf_vline(xintercept = time.stats['lab',]$mean, color = "red") +
  facet_grid(~condition, labeller = labeller(condition=condlabels))+
  labs(title="Spring", x="")

gonline <- gf_dhistogram(~tri_min, data = df_online) %>%
  gf_vline(xintercept = time.stats['online',]$mean, color = "red") +
  facet_grid(~condition, labeller = labeller(condition=condlabels))+
  labs(title="Online")

plot <-   ggarrange(gfall, gspring, gonline, common.legend = TRUE, ncol = 1, nrow =3)

  annotate_figure(plot, top = text_grob("Response Time by condition and term", 
   color = "red", face = "bold", size = 14))

2.1 Response Latency vs. Accuracy

Are response time (on the triangular blocks, both test and learn) and response accuracy correlated?

#VISUALIZE relationship between response time and accuracy
gf_jitter(tri_min ~ triangular_score, data = df_subjects) %>% 
  gf_lm()

#VISUALIZE relationship between response time and accuracy by condition
gf_jitter(tri_min ~ triangular_score, color = ~condition, data = df_subjects) %>% 
  gf_facet_grid(~condition) %>% 
  gf_lm()


From visual inspection, it does not appear that score and time are strongly correlated.

# 
# #simple linear model predicting triangular score from triangular time
# m1 = lm(triangular_score ~ tri_min, data = df_subjects)
# summary(m1)
# cor.test(df_subjects$triangular_score, df_subjects$tri_min)
# 
# #linear model with log-transformed triangular time
# mT = lm(triangular_score ~ log(tri_min), data = df_subjects)
# summary(mT)
# cor.test(df_subjects$triangular_score, df_subjects$log_time)

However, a simple linear regression indicates a significant, though small, negative correlation between score accuracy and response time, r = -0.18, t(124) = -2.04, p < .05.

2.2 Response Latency by Condition

Do response times differ by condition?

time.stats <- favstats(tri_min ~ condition, data = df_subjects)
time.stats
##   condition      min       Q1   median       Q3      max     mean       sd   n
## 1       111 3.146717 7.658642 9.316917 11.23459 27.87285 9.996164 3.991173 130
## 2       121 1.983417 6.868392 8.363025 10.93544 26.13407 9.233849 3.690071 136
##   missing
## 1       0
## 2       0
gf_dhistogram(~tri_min, fill = ~condition, data = df_subjects) %>% 
  gf_facet_grid(condition~.)

The average total response latency (entire 15-question block) for the control condition was slightly higher (M = 10.00 min, SD = 3.99) than for the impasse-scaffold condition (M = 9.23 min, SD = 3.69).

#simple linear model
m1 <- lm(tri_min ~ condition, data = df_subjects)
summary(m1)
## 
## Call:
## lm(formula = tri_min ~ condition, data = df_subjects)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.2504 -2.3535 -0.7315  1.6319 17.8767 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    9.9962     0.3368  29.680   <2e-16 ***
## condition121  -0.7623     0.4710  -1.618    0.107    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.84 on 264 degrees of freedom
## Multiple R-squared:  0.009824,   Adjusted R-squared:  0.006073 
## F-statistic: 2.619 on 1 and 264 DF,  p-value: 0.1068
#linear model on log-transformed data 
mT <- lm(log(tri_min) ~ condition, data = df_subjects)
summary(mT)
## 
## Call:
## lm(formula = log(tri_min) ~ condition, data = df_subjects)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.46400 -0.21203 -0.00751  0.23148  1.11442 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.23356    0.03335  66.964   <2e-16 ***
## condition121 -0.08474    0.04665  -1.817   0.0704 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3803 on 264 degrees of freedom
## Multiple R-squared:  0.01234,    Adjusted R-squared:  0.008603 
## F-statistic:   3.3 on 1 and 264 DF,  p-value: 0.07043

The difference in average response time is not statistically significant, F(1, 264) = 2.62, p = .107, with a linear model predicting response time from condition explaining only 1% of the variance.

2.3 Is Response Latency predictive?

Does adding response time to the model help predict score accuracy?

#simple  model predicting triangular score from triangular time
m1 = lm(triangular_score ~ condition, data = df_subjects)
summary(m1)
## 
## Call:
## lm(formula = triangular_score ~ condition, data = df_subjects)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -5.537 -3.537 -2.185  3.463 10.815 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    4.1846     0.4107  10.189  < 2e-16 ***
## condition121   2.3521     0.5744   4.095 5.62e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.683 on 264 degrees of freedom
## Multiple R-squared:  0.05972,    Adjusted R-squared:  0.05616 
## F-statistic: 16.77 on 1 and 264 DF,  p-value: 5.618e-05
anova(m1) # supernova(m1)
## Analysis of Variance Table
## 
## Response: triangular_score
##            Df Sum Sq Mean Sq F value    Pr(>F)    
## condition   1  367.7  367.73  16.769 5.618e-05 ***
## Residuals 264 5789.4   21.93                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#model including condition and triangular time
m2 = lm(triangular_score ~ condition + tri_min, data = df_subjects)
summary(m2)
## 
## Call:
## lm(formula = triangular_score ~ condition + tri_min, data = df_subjects)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.504 -3.253 -2.035  3.514 11.465 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   5.61541    0.85100   6.599 2.27e-10 ***
## condition121  2.24304    0.57434   3.905  0.00012 ***
## tri_min      -0.14313    0.07468  -1.917  0.05635 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.659 on 263 degrees of freedom
## Multiple R-squared:  0.07268,    Adjusted R-squared:  0.06563 
## F-statistic: 10.31 on 2 and 263 DF,  p-value: 4.907e-05
anova(m2) # supernova(m2)
## Analysis of Variance Table
## 
## Response: triangular_score
##            Df Sum Sq Mean Sq F value    Pr(>F)    
## condition   1  367.7  367.73  16.939 5.171e-05 ***
## tri_min     1   79.8   79.76   3.674   0.05635 .  
## Residuals 263 5709.6   21.71                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#model including condition and triangular time (log-transformed)
mT = lm(triangular_score ~ condition + log(tri_min), data = df_subjects)
summary(mT)
## 
## Call:
## lm(formula = triangular_score ~ condition + log(tri_min), data = df_subjects)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.975 -3.258 -2.017  3.583 11.329 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    6.7732     1.7374   3.898 0.000123 ***
## condition121   2.2539     0.5765   3.910 0.000118 ***
## log(tri_min)  -1.1589     0.7559  -1.533 0.126442    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.671 on 263 degrees of freedom
## Multiple R-squared:  0.06805,    Adjusted R-squared:  0.06097 
## F-statistic: 9.603 on 2 and 263 DF,  p-value: 9.438e-05
anova(mT) # supernova(m2)
## Analysis of Variance Table
## 
## Response: triangular_score
##               Df Sum Sq Mean Sq F value   Pr(>F)    
## condition      1  367.7  367.73 16.8546 5.39e-05 ***
## log(tri_min)   1   51.3   51.28  2.3505   0.1264    
## Residuals    263 5738.1   21.82                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The model predicting accuracy from condition and response latency explains approximately 7% of the variance (R² = .073), while the condition-only model explains approximately 6% (R² = .060); however, the response latency predictor does not reach significance at the 5% alpha level (p = .056).

TODO: What is the appropriate inference to draw from this model comparison?
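One standard tool for that TODO is a nested-model F test (and an AIC comparison). A hedged sketch on simulated data with hypothetical variable names, mirroring the m1-vs-m2 comparison above:

```r
#Hedged sketch on SIMULATED data: compare nested linear models
set.seed(3)
d <- data.frame(cond = rep(c("111", "121"), each = 50),
                time = rnorm(100, 10, 3))
d$score <- 3 + 2 * (d$cond == "121") - 0.1 * d$time + rnorm(100, 0, 4)

m_small <- lm(score ~ cond, data = d)         #condition only
m_big   <- lm(score ~ cond + time, data = d)  #condition + latency

anova(m_small, m_big)  #F test: does adding time significantly reduce RSS?
AIC(m_small, m_big)    #lower AIC favors a model after a complexity penalty
```

For the study models this would be anova(m1, m2): a non-significant F (and a barely-changed AIC) would indicate that response latency does not add reliable predictive value beyond condition.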

2.4 Response Latency on Learn (v) Test

#DESCRIBE distribution of learning block times
learn.time.stats <- favstats(df_subjects$learn_min)
learn.time.stats
##        min       Q1   median     Q3      max    mean       sd   n missing
##  0.6142333 2.353563 3.325692 4.6456 12.96385 3.78128 2.046571 266       0
#VISUALIZE distribution of learning block time
gf_dhistogram(~learn_min, data = df_subjects) %>%
  gf_vline(xintercept = mean(df_subjects$learn_min), color = "red") %>% 
  gf_labs(title = "Distribution of LEARN block response time ")

Total learning block times ranged from 0.61 to 12.96 minutes (M = 3.78, SD = 2.05).

#DESCRIBE distribution of test block times
test.time.stats <- favstats(df_subjects$test_min)
test.time.stats
##       min       Q1  median       Q3      max     mean     sd   n missing
##  1.369183 4.250808 5.42165 6.454271 18.86497 5.825129 2.5289 266       0
#VISUALIZE distribution of test block time
gf_dhistogram(~test_min, data = df_subjects) %>%
  gf_vline(xintercept = mean(df_subjects$test_min), color = "red") %>% 
  gf_labs(title = "Distribution of TEST block response time ")

Total testing block times ranged from 1.37 to 18.86 minutes (M = 5.83, SD = 2.53).

Is time spent on learning blocks correlated with time spent on testing blocks?

#VISUALIZE test v learning block time
gf_point(test_min ~ learn_min, color = ~condition, data = df_subjects) %>% 
  gf_facet_grid(~condition) %>% 
  gf_lm()

m1 <- lm( test_min ~ learn_min , data = df_subjects)
summary(m1)
## 
## Call:
## lm(formula = test_min ~ learn_min, data = df_subjects)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.5047 -1.3944 -0.3009  0.8506 11.2802 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.90532    0.29798  13.106  < 2e-16 ***
## learn_min    0.50771    0.06933   7.323 2.94e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.31 on 264 degrees of freedom
## Multiple R-squared:  0.1688, Adjusted R-squared:  0.1657 
## F-statistic: 53.62 on 1 and 264 DF,  p-value: 2.944e-12
anova(m1) #supernova(m1)
## Analysis of Variance Table
## 
## Response: test_min
##            Df  Sum Sq Mean Sq F value    Pr(>F)    
## learn_min   1  286.11 286.114  53.622 2.944e-12 ***
## Residuals 264 1408.65   5.336                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
cor.test(df_subjects$test_min, df_subjects$learn_min)
## 
##  Pearson's product-moment correlation
## 
## data:  x and y
## t = 7.3227, df = 264, p-value = 2.944e-12
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3057160 0.5061395
## sample estimates:
##       cor 
## 0.4108799

A linear model predicting test block time from learning block time shows that learning block time explains approximately 17% of the variance in testing block time, F(1, 264) = 53.62, p < .001, with roughly a 30-second increase in testing time for every 1-minute increase in learning time (b = 0.51). The two variables have a moderate, significant correlation, r = 0.41, t(264) = 7.32, p < .001, 95% CI [0.31, 0.51].

DUE DILIGENCE

1. Power Analysis

What is the appropriate sample size to detect a moderate-sized effect (f = 0.25) at a 0.05 alpha level?

#K = 2 groups, f = 0.25 is moderate effect size
pwr.anova.test(k=2,f=0.25 ,sig.level=.05, power=.8)
## 
##      Balanced one-way analysis of variance power calculation 
## 
##               k = 2
##               n = 63.76561
##               f = 0.25
##       sig.level = 0.05
##           power = 0.8
## 
## NOTE: n is number in each group

The studies should aim for at least 64 subjects per group (128 total) to detect a moderate-sized main effect between two conditions.

2. TERM-LEVEL analysis

Data from the first term only (Fall 2017) were analyzed and presented as a paper at CogSci 2019, in support of the experimental hypothesis (H1: impasse is an effective scaffold). Combined across terms, the prior analyses ALSO support the experimental hypothesis. Do the spring-only data also support the H1 hypothesis?

#simple linear model predicting triangular score from condition for spring data only
mspring = lm(triangular_score ~ condition, data = df_subjects %>% filter(term=='spring18'))

#partition variance
anova(mspring)# supernova(mspring)
confint(mspring)

A linear model predicting triangular_score from condition for only the data collected in the Spring term does not reveal a significant difference between conditions. However, the previous power analysis suggests that this test may be underpowered, as it includes data from only 72 subjects (total), instead of the recommended 128 (64 per group). It seems more appropriate to combine data from Fall and Spring.

3. Combining Fall and Spring

#VISUALIZE scores by condition
gf_dhistogram( ~triangular_score, fill= ~condition, data = df_subjects) %>%
  gf_facet_grid(condition~term) 

#VISUALIZE scores by condition
gf_boxplot(triangular_score ~ condition, data=df_subjects) %>%
  gf_facet_grid(~term) %>% 
  gf_jitter(color=~condition, alpha=0.5)

By visual inspection, the distributions of accuracy scores across conditions appear similar across terms.

It is therefore reasonable to conclude that data collected in Spring 2018 can be combined with Fall 2017.

Distribution of Score Variable

The accuracy score variable is not normally distributed. Should we be modelling these data with an alternative distribution (such as Poisson, for count data)?

library(fitdistrplus)

plotdist(df_fall$triangular_score, histo = TRUE, demp = TRUE)

fit_n  <- fitdist(df_fall$triangular_score, "norm")
fit_p  <- fitdist(df_fall$triangular_score, "pois")
summary(fit_n)
## Fitting of the distribution ' norm ' by maximum likelihood 
## Parameters : 
##      estimate Std. Error
## mean 5.759259  0.6707361
## sd   4.928884  0.4742820
## Loglikelihood:  -162.7588   AIC:  329.5175   BIC:  333.4955 
## Correlation matrix:
##               mean            sd
## mean  1.000000e+00 -2.260365e-09
## sd   -2.260365e-09  1.000000e+00
summary(fit_p)
## Fitting of the distribution ' pois ' by maximum likelihood 
## Parameters : 
##        estimate Std. Error
## lambda 5.759259  0.3265776
## Loglikelihood:  -195.1632   AIC:  392.3264   BIC:  394.3154
gofstat(list(fit_n, fit_p), fitnames = c("norm", "pois"))
## Goodness-of-fit statistics
##                                  norm       pois
## Kolmogorov-Smirnov statistic 0.286271  0.4263866
## Cramer-von Mises statistic   1.040138  2.5698714
## Anderson-Darling statistic   5.824593 27.2235450
## 
## Goodness-of-fit criteria
##                                    norm     pois
## Akaike's Information Criterion 329.5175 392.3264
## Bayesian Information Criterion 333.4955 394.3154
par(mfrow=c(2,2))
plot.legend <- c("normal", "poisson")
denscomp(list(fit_n, fit_p), legendtext = plot.legend)
cdfcomp (list(fit_n, fit_p), legendtext = plot.legend)
qqcomp  (list(fit_n, fit_p), legendtext = plot.legend)
ppcomp  (list(fit_n, fit_p), legendtext = plot.legend)

[NON LINEAR MODEL]

Try a Poisson regression, followed by a quasi-Poisson regression to check for overdispersion.

m2 = glm(triangular_score ~ condition, family = poisson(link = "log"),data = df_subjects)
summary(m2)
## 
## Call:
## glm(formula = triangular_score ~ condition, family = poisson(link = "log"), 
##     data = df_subjects)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -2.705  -1.549  -1.190   1.256   4.083  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   1.43141    0.04287  33.386  < 2e-16 ***
## condition121  0.44603    0.05443   8.194 2.53e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 1021.5  on 265  degrees of freedom
## Residual deviance:  952.5  on 264  degrees of freedom
## AIC: 1809.5
## 
## Number of Fisher Scoring iterations: 5
#plot residuals
plot(fitted(m2), resid(m2))
abline(0,0) #add a horizontal line at 0

plot(density(resid(m2)))
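The quasi-Poisson fit below estimates a dispersion parameter; the same overdispersion check can be done by hand as the sum of squared Pearson residuals over the residual degrees of freedom. A hedged sketch on simulated overdispersed counts (not the study data):

```r
#Hedged sketch on SIMULATED counts: Pearson dispersion statistic for a Poisson GLM
#(values well above 1 suggest overdispersion -> quasipoisson or negative binomial)
set.seed(4)
g <- rep(c(0, 1), each = 60)
y <- rnbinom(120, mu = exp(1.4 + 0.45 * g), size = 2)  #overdispersed by construction
fit <- glm(y ~ g, family = poisson)
disp <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
disp
```

For the study model, the analogous check is sum(residuals(m2, type = "pearson")^2) / df.residual(m2), which should land near the dispersion parameter (4.14) reported by the quasi-Poisson summary below.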

m3 = glm(triangular_score ~ condition, family = quasipoisson(link = "log"),data = df_subjects)
summary(m3)
## 
## Call:
## glm(formula = triangular_score ~ condition, family = quasipoisson(link = "log"), 
##     data = df_subjects)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -2.705  -1.549  -1.190   1.256   4.083  
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1.43141    0.08719  16.417  < 2e-16 ***
## condition121  0.44603    0.11070   4.029 7.33e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 4.13573)
## 
##     Null deviance: 1021.5  on 265  degrees of freedom
## Residual deviance:  952.5  on 264  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 5
#plot residuals
plot(fitted(m3), resid(m3))
abline(0,0) #add a horizontal line at 0

plot(density(resid(m3)))